(57) Abstract

In a harmonic speech encoder (16), a speech signal to be encoded is represented by a plurality of LPC parameters, which are
determined by an LPC parameter computer (30), a pitch value and a gain value. The speech encoder comprises a (coarse) pitch
estimator (38) for determining a coarse pitch value, and a Refined Pitch Computer (32) for determining a refined pitch value from
the coarse pitch value. The refined pitch value is determined in an analysis-by-synthesis fashion: the refined pitch value selected
is the one that results in a minimum error measure between a representation of a synthetic speech signal and a representation of
the original speech signal.
Transmitter with an improved harmonic speech encoder.



The present invention is related to a transmitter with a speech encoder, said
speech encoder comprising analysis means for determining a plurality of linear prediction
coefficients from a speech signal, said analysis means comprising pitch determining means for
determining a fundamental frequency of said speech signal, the analysis means further being
arranged for determining an amplitude and a frequency of a plurality of harmonically related
sinusoidal signals representing said speech signal from said plurality of linear prediction
coefficients and said fundamental frequency.

The present invention is also related to a speech encoder, a speech encoding
method and a tangible medium comprising a computer program implementing said method.

A transmitter according to the preamble is known from EP 259 950.

Such transmitters and speech encoders are used in applications in which speech
signals have to be transmitted over a transmission medium with a limited transmission capacity
or have to be stored on storage media with a limited storage capacity. Examples of such
applications are the transmission of speech signals over the Internet, the transmission of speech
signals from a mobile phone to a base station and vice versa, and the storage of speech signals on a
CD-ROM, in a solid state memory or on a hard disk drive.

Different operating principles of speech encoders have been tried to achieve a
reasonable speech quality at a modest bit rate. In one of these operating principles the speech
signal is represented by a plurality of harmonically related sinusoidal signals. The transmitter
comprises a speech encoder with analysis means for determining a pitch of the speech signal
representing the fundamental frequency of said sinusoidal signals. The analysis means are also
arranged for determining the amplitude of said plurality of sinusoidal signals.

The amplitudes of said plurality of sinusoidal signals can be obtained by
determining prediction coefficients, calculating a frequency spectrum from said prediction
coefficients, and sampling said frequency spectrum with the pitch frequency.

A problem with the known transmitters is that the quality of the reconstructed
speech signal is lower than expected.

An object of the present invention is to provide a transmitter according to the
preamble which delivers an improved quality of the reconstructed speech.

To this end, the transmitter according to the invention is characterized in that the
analysis means comprise pitch tuning means for tuning the fundamental frequency of said
plurality of harmonically related signals in order to minimize a measure between a representation
of said speech signal and a representation of said plurality of harmonically related sinusoidal
signals, the transmitter comprising transmit means for transmitting a representation of said
amplitudes and said fundamental frequency.

The present invention is based on the recognition that the combination of the
amplitudes of the sinusoidal signals as determined by the analysis means and the pitch as
determined by the pitch determining means does not constitute an optimal representation of the
speech signal. By tuning the pitch in an analysis-by-synthesis like fashion it is possible to
achieve an increased quality of the reconstructed speech signal without increasing the bit rate of
the encoded speech signal.

The "analysis-by-synthesis" can be performed by comparing the original
speech signal with a speech signal reconstructed on the basis of the amplitudes and the actual pitch
value. It is also possible to determine the spectrum of the original speech signal and to compare it
with a spectrum determined from the amplitudes of the sinusoidal signals and the pitch value.

An embodiment of the invention is characterized in that the determination of
the amplitude and the frequency of a plurality of harmonically related speech signals is based on
substantially unquantized prediction coefficients, in that the representation of said amplitudes
comprises quantized prediction coefficients, and a gain factor which is determined on the basis of the
quantized prediction coefficients and said fundamental frequency.

From experiments it became clear that performing the "analysis by synthesis"
on the basis of the quantized prediction coefficients caused undesired artifacts in the reconstructed
speech. Subsequent experiments have shown that, by using the unquantized
prediction coefficients in the "analysis by synthesis" and calculating the gain factor from the
quantized prediction coefficients and the (refined) fundamental frequency, these artifacts can be
avoided.

A further embodiment of the invention is characterized in that the analysis
means comprise initial pitch determining means for providing at least an initial pitch value for
the pitch tuning means.

By using initial pitch determining means, it is possible to determine initial
values for the analysis by synthesis lying close to the optimum pitch value. This results in a
decreased amount of computation required for finding said optimum pitch value.

The present invention will now be explained with reference to the drawing
figures. Herein shows:

Fig. 1, a transmission system in which the present invention can be used;

Fig. 2, a speech encoder 4 according to the invention;

Fig. 3, a voiced speech encoder 16 according to the present invention;

Fig. 4, LPC computation means 30 for use in the voiced speech encoder 16
according to Fig. 3;

Fig. 5, pitch tuning means 32 for use in the speech encoder according to Fig. 3;

Fig. 6, a speech encoder 14 for unvoiced speech, for use in the speech encoder
according to Fig. 2;

Fig. 7, a speech decoder 14 for use in the system according to Fig. 1;

Fig. 8, a voiced speech decoder 94 for use in the speech decoder 14;

Fig. 9, graphs of signals present at a number of points in the voiced speech
decoder 94;

Fig. 10, an unvoiced speech decoder 96 for use in the speech decoder 14.

In the transmission system according to Fig. 1, a speech signal is applied to an
input of a transmitter 2. In the transmitter 2, the speech signal is encoded in a speech encoder 4.
The encoded speech signal at the output of the speech encoder 4 is passed to transmit means 6.
The transmit means 6 are arranged for performing channel coding, interleaving and modulation
of the coded speech signal.

The output signal of the transmit means 6 is passed to the output of the
transmitter, and is conveyed to a receiver 5 via a transmission medium 8. At the receiver 5, the
output signal of the channel is passed to receive means 7. These receive means 7 provide RF
processing, such as tuning and demodulation, de-interleaving (if applicable) and channel
decoding. The output signal of the receive means 7 is passed to the speech decoder 9 which
converts its input signal to a reconstructed speech signal.

The input signal $s_s[n]$ of the speech encoder 4 according to Fig. 2 is filtered by
a DC notch filter 10 to eliminate undesired DC offsets from the input. Said DC notch filter has a
cut-off frequency (-3 dB) of 15 Hz. The output signal of the DC notch filter 10 is applied to an
input of a buffer 11. The buffer 11 presents blocks of 400 DC-filtered speech samples to a voiced
speech encoder 16 according to the invention. Said block of 400 samples comprises 5 frames of
10 ms of speech (80 samples each): the frame presently to be encoded, two
preceding frames and two subsequent frames. In each frame interval the buffer 11 presents the most
recently received frame of 80 samples to an input of a 200 Hz high pass filter 12. The output of
the high pass filter 12 is connected to an input of an unvoiced speech encoder 14 and to an input
of a voiced/unvoiced detector 28. The high pass filter 12 provides blocks of 360 samples to the
voiced/unvoiced detector 28 and blocks of 160 samples (if the speech encoder 4 operates in a 5.2
kbit/sec mode) or 240 samples (if the speech encoder 4 operates in a 3.2 kbit/sec mode) to the
unvoiced speech encoder 14. The relation between the different blocks of samples presented
above and the output of the buffer 11 is presented in the table below.



Element                        5.2 kbit/sec           3.2 kbit/sec
                               #samples   start       #samples   start
high pass filter 12            80         320         80         320
voiced/unvoiced detector 28    360        0...40      360        0...40
voiced speech encoder 16       400        0           400        0
unvoiced speech encoder 14     160        120         240        120
present frame to be encoded    80         160         80         160



The voiced/unvoiced detector 28 determines whether the current frame
comprises voiced or unvoiced speech, and presents the result as a voiced/unvoiced flag. This flag
is passed to a multiplexer 22, to the unvoiced speech encoder 14 and to the voiced speech encoder
16. Dependent on the value of the voiced/unvoiced flag, the voiced speech encoder 16 or the
unvoiced speech encoder 14 is activated.

In the voiced speech encoder 16 the input signal is represented as a plurality of
harmonically related sinusoidal signals. The output of the voiced speech encoder provides a pitch
value, a gain value and a representation of 16 prediction parameters. The pitch value and the gain
value are applied to corresponding inputs of a multiplexer 22.

In the 5.2 kbit/sec mode the LPC computation is performed every 10 ms. In
the 3.2 kbit/sec mode the LPC computation is performed every 20 ms, except when a transition
from unvoiced to voiced speech or vice versa takes place. If such a transition occurs, the LPC
calculation in the 3.2 kbit/sec mode is also performed every 10 ms.

The LPC coefficients at the output of the voiced speech encoder are encoded by
a Huffman encoder 24. The length of the Huffman encoded sequence is compared with the length
of the corresponding input sequence by a comparator in the Huffman encoder 24. If the
Huffman encoded sequence is longer than the input sequence, it is decided to transmit the
uncoded sequence. Otherwise it is decided to transmit the Huffman encoded sequence. Said
decision is represented by a "Huffman bit" which is applied to a multiplexer 26 and to a
multiplexer 22. The multiplexer 26 is arranged to pass the Huffman encoded sequence or the
input sequence to the multiplexer 22 in dependence on the value of the "Huffman bit". The use
of the "Huffman bit" in combination with the multiplexer 26 ensures
that the length of the representation of the prediction coefficients does not exceed a
predetermined value. Without the "Huffman bit" and the multiplexer 26 it could
happen that the length of the Huffman encoded sequence exceeds the length of the input
sequence to such an extent that the encoded sequence no longer fits in the transmit frame,
in which a limited number of bits is reserved for the transmission of the LPC coefficients.
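The selection performed by the comparator and the multiplexer 26 can be illustrated with a minimal sketch. It assumes that both sequences are available as bit strings; the function name `select_lpc_representation` and the polarity of the flag (1 meaning that the Huffman encoded sequence is transmitted) are hypothetical and are not taken from the description.

```python
from typing import Tuple


def select_lpc_representation(raw_bits: str, huffman_bits: str) -> Tuple[int, str]:
    """Return (huffman_bit, payload) for one set of LPC codes.

    raw_bits     -- uncoded sequence of LPC codes, as a string of '0'/'1'
    huffman_bits -- the same codes after Huffman encoding
    The flag polarity (1 = Huffman encoded payload) is an assumption.
    """
    if len(huffman_bits) > len(raw_bits):
        # The Huffman code expanded the data: transmit the uncoded sequence,
        # so the payload never exceeds the length of the input sequence.
        return 0, raw_bits
    return 1, huffman_bits
```

In this sketch the payload length is bounded by the length of the uncoded sequence, which is the property the "Huffman bit" is intended to guarantee.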

In the unvoiced speech encoder 14 a gain value and 6 prediction coefficients
are determined to represent the unvoiced speech signal. The 6 LPC coefficients are encoded by a
Huffman encoder 18 which presents at its output a Huffman encoded sequence and a "Huffman
bit". The Huffman encoded sequence and the input sequence of the Huffman encoder 18 are
applied to a multiplexer 20 which is controlled by the "Huffman bit". The operation of the
combination of the Huffman encoder 18 and the multiplexer 20 is the same as the operation of
the Huffman encoder 24 and the multiplexer 26.

The output signal of the multiplexer 20 and the "Huffman bit" are applied to
corresponding inputs of the multiplexer 22. The multiplexer 22 is arranged for selecting the
encoded voiced speech signal or the encoded unvoiced speech signal, dependent on the decision
of the voiced/unvoiced detector 28. At the output of the multiplexer 22 the encoded speech
signal is available.

In the voiced speech encoder 16 according to Fig. 3, the analysis means
according to the invention are constituted by the LPC Parameter Computer 30, the Refined Pitch
Computer 32 and the Pitch Estimator 38. The speech signal s[n] is applied to an input of the LPC
Parameter Computer 30. The LPC Parameter Computer 30 determines the prediction coefficients
a[i], the quantized prediction coefficients aq[i] obtained after quantizing, coding and decoding
a[i], and LPC codes C[i], in which i can have values from 0 to 15.

The pitch determination means according to the inventive concept comprise
initial pitch determining means, being here a pitch estimator 38, and pitch tuning means, being
here a Pitch Range Computer 34 and a Refined Pitch Computer 32. The pitch estimator 38
determines a coarse pitch value, which is used in the Pitch Range Computer 34 for determining the
pitch values which are to be tried in the pitch tuning means, further to be referred to as the Refined
Pitch Computer 32, for determining the final pitch value. The pitch estimator 38 provides a coarse
pitch period expressed in a number of samples. The pitch values to be used in the Refined Pitch
Computer 32 are determined by the Pitch Range Computer 34 from the coarse pitch period
according to the table below.



Coarse pitch period p    Frequency (Hz)    Search range    Step size    #candidates
20 ≤ p ≤ 39              400...200         p-3...p+3       0.25         24
40 ≤ p ≤ 79              200...100         p-2...p+2       0.25         16
80 ≤ p ≤ 200             100...40          p               1            1


In the amplitude spectrum computer 36 a windowed speech signal $s_{HAM}$ is
determined from the signal s[i] according to:

$$ s_{HAM}[i-120] = w_{HAM}[i] \cdot s[i] \tag{1} $$

In (1) $w_{HAM}[i]$ is equal to:

$$ w_{HAM}[i] = 0.54 - 0.46\cos\left\{ \frac{2\pi\,((i+0.5)-120)}{160} \right\}; \quad 120 \le i < 280 \tag{2} $$

The windowed speech signal $s_{HAM}[i]$ is transformed to the frequency domain
using a 512 point FFT. The spectrum $S_w$ obtained by said transformation is equal to:

$$ S_w[k] = \sum_{m=0}^{159} s_{HAM}[m] \cdot e^{-j 2\pi k m / 512} \tag{3} $$

The amplitude spectrum to be used in the Refined Pitch Computer 32 is calculated according to:

$$ |S_w[k]| = \sqrt{ (\Re\{S_w[k]\})^2 + (\Im\{S_w[k]\})^2 } \tag{4} $$
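The computation of the target amplitude spectrum can be sketched as below. The sketch assumes the half-sample-offset form of the Hamming window and evaluates the 512 point transform as a direct DFT of the 160 windowed samples (equivalent to zero-padding before an FFT). The function names are hypothetical, and a practical encoder would use an FFT routine instead of the direct sum.

```python
import cmath
import math
from typing import List, Sequence


def hamming_160(i: int) -> float:
    """Hamming window value for buffer index i, 120 <= i < 280 (assumed form)."""
    return 0.54 - 0.46 * math.cos(2.0 * math.pi * ((i + 0.5) - 120) / 160)


def amplitude_spectrum(s: Sequence[float], n_fft: int = 512) -> List[float]:
    """|S_w[k]| for k = 0..n_fft-1, from a 400-sample buffer s."""
    # Window samples 120..279 of the buffer; the result is indexed from 0.
    s_ham = [hamming_160(i) * s[i] for i in range(120, 280)]
    spectrum = []
    for k in range(n_fft):
        acc = sum(x * cmath.exp(-2j * math.pi * k * m / n_fft)
                  for m, x in enumerate(s_ham))
        # abs() of a complex number is sqrt(Re^2 + Im^2), as in (4).
        spectrum.append(abs(acc))
    return spectrum
```

For a constant input of 1.0 the bin at k = 0 equals the sum of the window, 0.54 x 160 = 86.4, which gives a quick sanity check of the window definition.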

The Refined Pitch Computer 32 determines from the a-parameters provided by
the LPC Parameter Computer 30 and the coarse pitch value a refined pitch value which results in
a minimum error signal between the amplitude spectrum according to (4) and the amplitude
spectrum of a signal comprising a plurality of harmonically related sinusoidal signals of which
the amplitudes have been determined by sampling the LPC spectrum by said refined pitch period.

In the gain computer 40 the optimum gain to match the target spectrum
accurately is calculated from the spectrum of the re-synthesized speech signal using the
quantized a-parameters, instead of using the non-quantized a-parameters as is done in the
Refined Pitch Computer 32.

At the output of the voiced speech encoder 16 the 16 LPC codes, the refined
pitch and the gain calculated by the Gain Computer 40 are available. The operation of the LPC
parameter computer 30 and of the Refined Pitch Computer 32 is explained below in more detail.

In the LPC computer 30 according to Fig. 4, a window operation is performed
on the signal s[n] by a window processor 50. According to one aspect of the present invention,
the analysis length is dependent on the value of the voiced/unvoiced flag. In the 5.2 kbit/sec
mode, the LPC computation is performed every 10 msec. In the 3.2 kbit/sec mode, the LPC
calculation is performed every 20 msec, except during transitions from voiced to unvoiced or
vice versa. If such a transition is present, the LPC calculation is performed every 10 msec.



In the following table the number of samples involved in the determination
of the prediction coefficients is given.

Bit rate and mode             Analysis length N_A and samples involved    Update interval
5.2 kbit/s                    160 (120-280)                               10 ms
3.2 kbit/s (transition)       160 (120-280)                               10 ms
3.2 kbit/s (no transition)    240 (120-360)                               20 ms
For the window in the 5.2 kbit/sec case, and in the 3.2 kbit/s case where a
transition is present, can be written:

$$ w_{HAM}[i] = 0.54 - 0.46\cos\left\{ \frac{2\pi\,((i+0.5)-120)}{160} \right\}; \quad 120 \le i < 280 \tag{5} $$

For the windowed speech signal is found:

$$ s_{HAM}[i-120] = w_{HAM}[i] \cdot s[i]; \quad 120 \le i < 280 \tag{6} $$

If in the 3.2 kbit/s case no transition is present, a flat top portion of 80 samples
is introduced in the middle of the window, thereby extending the window to span 240 samples
starting at sample 120 and ending before sample 360. In this way a window $w'_{HAM}$ is obtained
according to:

$$ w'_{HAM}[i] = \begin{cases} w_{HAM}[i] & ;\ 120 \le i < 200 \\ 1 & ;\ 200 \le i < 280 \\ w_{HAM}[i-80] & ;\ 280 \le i < 360 \end{cases} \tag{7} $$

For the windowed speech signal the following can be written:

$$ s_{HAM}[i-120] = w'_{HAM}[i] \cdot s[i]; \quad 120 \le i < 360 \tag{8} $$
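The two analysis windows can be sketched together as follows. The sketch assumes that the trailing part of the flat-top window is the second half of the 160-point Hamming window shifted by 80 samples, and that the Hamming window has the half-sample-offset form; the function names and the boolean `short_window` are hypothetical.

```python
import math
from typing import List, Sequence


def w_ham(i: int) -> float:
    """160-point Hamming window on buffer indices 120 <= i < 280 (assumed form)."""
    return 0.54 - 0.46 * math.cos(2.0 * math.pi * ((i + 0.5) - 120) / 160)


def w_ham_flat(i: int) -> float:
    """240-point flat-top window on buffer indices 120 <= i < 360."""
    if 120 <= i < 200:
        return w_ham(i)          # rising half of the Hamming window
    if 200 <= i < 280:
        return 1.0               # inserted flat top of 80 samples
    if 280 <= i < 360:
        return w_ham(i - 80)     # falling half, shifted by 80 (assumption)
    raise ValueError("index outside the 240-sample analysis window")


def windowed_signal(s: Sequence[float], short_window: bool) -> List[float]:
    """Windowed LPC analysis segment taken from a 400-sample buffer s.

    short_window is True in the 5.2 kbit/s mode and at transitions in the
    3.2 kbit/s mode; otherwise the 240-sample flat-top window is used.
    """
    if short_window:
        return [w_ham(i) * s[i] for i in range(120, 280)]
    return [w_ham_flat(i) * s[i] for i in range(120, 360)]
```

With these definitions the flat-top window is symmetric about its centre and joins the flat portion at a value very close to 1.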

The Autocorrelation Function Computer 58 determines the autocorrelation
function $R_{ss}$ of the windowed speech signal. The number of correlation coefficients to be
calculated is equal to the number of prediction coefficients + 1. If a voiced speech frame is
present, the number of autocorrelation coefficients to be calculated is 17. If an unvoiced speech
frame is present, the number of autocorrelation coefficients to be calculated is 7. The presence of
a voiced or unvoiced speech frame is signaled to the Autocorrelation Function Computer 58 by
the voiced/unvoiced flag.

The autocorrelation coefficients are windowed with a so-called lag-window in
order to obtain some spectral smoothing of the spectrum represented by said autocorrelation
coefficients. The smoothed autocorrelation coefficients $\rho[i]$ are calculated according to:

$$ \rho[i] = R_{ss}[i] \cdot \exp(\ldots); \quad 0 \le i \le P \tag{9} $$

In (9) $f_\mu$ is the spectral smoothing constant having a value of 46.4 Hz. The windowed
autocorrelation values $\rho[i]$ are passed to the Schur recursion module 62 which calculates the
reflection coefficients k[1] to k[P] in a recursive way. The Schur recursion is well known to
those skilled in the art.

In a converter 66 the P reflection coefficients k[i] are transformed into a-parameters
for use in the Refined Pitch Computer 32 in Fig. 3. In a quantizer 64 the reflection
coefficients are converted into Log Area Ratios, and these Log Area Ratios are subsequently
uniformly quantized. The resulting LPC codes C[1] ... C[P] are passed to the output of the
LPC parameter computer for further transmission.

In the local decoder 54 the LPC codes C[1] ... C[P] are converted into
reconstructed reflection coefficients k[i] by a reflection coefficient reconstructor 54.
Subsequently the reconstructed reflection coefficients k[i] are converted into (quantized)
a-parameters by the Reflection Coefficient to a-parameter converter 56.

This local decoding is performed in order to have the same a-parameters
available in the speech encoder 4 and the speech decoder 14.
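The chain of Log Area Ratio quantization, local decoding and conversion to a-parameters can be sketched as below. Several details are assumptions made for illustration only: the Log Area Ratio is taken as log((1+k)/(1-k)), the quantizer step size `step` is a free parameter, and the conversion uses the common step-up recursion with a sign convention chosen so that A(z) = 1 + a[1]z^-1 + ... + a[P]z^-P. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def reflection_to_lar(k: Sequence[float]) -> List[float]:
    """Log Area Ratios from reflection coefficients (|k| < 1), assumed definition."""
    return [math.log((1.0 + ki) / (1.0 - ki)) for ki in k]


def lar_to_reflection(lar: Sequence[float]) -> List[float]:
    """Inverse mapping: k = (e^g - 1) / (e^g + 1) = tanh(g / 2)."""
    return [math.tanh(g / 2.0) for g in lar]


def quantize_uniform(x: Sequence[float], step: float) -> List[int]:
    """Uniform quantizer producing integer LPC codes."""
    return [int(round(v / step)) for v in x]


def dequantize_uniform(codes: Sequence[int], step: float) -> List[float]:
    return [c * step for c in codes]


def reflection_to_a(k: Sequence[float]) -> List[float]:
    """Step-up recursion; returns [1, a_1, ..., a_P]."""
    a = [1.0]
    for km in k:
        m = len(a)  # order of the polynomial after this step
        a = [1.0] + [a[i] + km * a[m - i] for i in range(1, m)] + [km]
    return a


def local_decode(codes: Sequence[int], step: float) -> List[float]:
    """Quantized a-parameters, computed exactly as a decoder would."""
    return reflection_to_a(lar_to_reflection(dequantize_uniform(codes, step)))
```

Because `local_decode` starts from the transmitted codes only, an encoder and a decoder that both call it obtain identical quantized a-parameters.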

In the Refined Pitch Computer 32 according to Fig. 5, a Pitch Frequency
Candidate Selector 70 determines from the number of candidates, the start value and the step size
as received from the Pitch Range Computer 34 the candidate pitch values to be used in the
Refined Pitch Computer 32. For each of the candidates, the Pitch Frequency Candidate Selector
70 determines a fundamental frequency $f_{0,i}$.

Using the candidate frequency $f_{0,i}$ the spectral envelope described by the LPC
coefficients is sampled at harmonic locations by the Spectrum Envelope Sampler 72. For $m_{i,k}$,
being the amplitude of the k-th harmonic of the i-th candidate $f_{0,i}$, can be written:

$$ m_{i,k} = \left| \frac{1}{A(z)} \right|_{z = e^{j 2\pi k f_{0,i}}} \tag{10} $$

In (10), A(z) is equal to:
$$ A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_P z^{-P} \tag{11} $$

With $z = e^{j\theta_{i,k}} = \cos\theta_{i,k} + j\sin\theta_{i,k}$ and $\theta_{i,k} = 2\pi k f_{0,i}$, (11) changes into:

$$ A(z)\big|_{z = e^{j\theta_{i,k}}} = 1 + a_1(\cos\theta_{i,k} - j\sin\theta_{i,k}) + \cdots + a_P(\cos P\theta_{i,k} - j\sin P\theta_{i,k}) \tag{12} $$

By splitting (12) into real and imaginary parts, the amplitudes $m_{i,k}$ can be
obtained according to:

$$ m_{i,k} = \frac{1}{\sqrt{R^2(\theta_{i,k}) + I^2(\theta_{i,k})}} \tag{13} $$

where

$$ R(\theta_{i,k}) = 1 + a_1\cos\theta_{i,k} + \cdots + a_P\cos P\theta_{i,k} \tag{14} $$

and

$$ I(\theta_{i,k}) = -\left( a_1\sin\theta_{i,k} + \cdots + a_P\sin P\theta_{i,k} \right) \tag{15} $$
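The sampling of the LPC spectral envelope at the harmonics of a candidate fundamental frequency can be sketched as follows. The sketch assumes a sampling frequency of 8 kHz (80 samples per 10 ms frame), a fundamental frequency given in Hz, and harmonics taken up to half the sampling frequency; the number of harmonics L is not specified here, so that limit is an assumption, as is the function name.

```python
import cmath
import math
from typing import List, Sequence


def harmonic_amplitudes(a: Sequence[float], f0_hz: float,
                        fs_hz: float = 8000.0) -> List[float]:
    """Amplitudes m_k = 1 / |A(e^{j*theta_k})| for k = 1..L.

    a     -- prediction coefficients a_1..a_P (the leading 1 is implied)
    f0_hz -- candidate fundamental frequency in Hz
    L is taken as the number of harmonics below fs/2 (assumption).
    """
    amplitudes = []
    k = 1
    while k * f0_hz < fs_hz / 2.0:
        theta = 2.0 * math.pi * k * f0_hz / fs_hz
        # A(z) evaluated on the unit circle: z^-n = exp(-j*n*theta).
        A = 1.0 + sum(an * cmath.exp(-1j * n * theta)
                      for n, an in enumerate(a, start=1))
        amplitudes.append(1.0 / abs(A))
        k += 1
    return amplitudes
```

Since only the magnitude of A is used, the sign of the imaginary part has no influence on the resulting amplitudes.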



The candidate spectrum $|\hat{S}_{w,i}|$ is determined by convolving the spectral lines
$m_{i,k}$ ($1 \le k \le L$) with a spectral window function W, which is the 8192 point FFT of the 160 points
Hamming window according to (5) or (7), dependent on the current operating mode of the
encoder. It is observed that the 8192 points FFT can be pre-calculated and that the result can be
stored in ROM. In the convolving process a downsampling operation is performed, because the
candidate spectrum has to be compared with 256 points of the reference spectrum, making
calculation of more than 256 points useless. Consequently for $|\hat{S}_{w,i}|$ can be written:

$$ |\hat{S}_{w,i}[f]| = \sum_{k=1}^{L} m_{i,k} \cdot W(16 f - k f_{0,i}); \quad 0 \le f < 256 \tag{16} $$

Expression (16) gives only the general shape of the amplitude spectrum for
pitch candidate i, but not its amplitude. Consequently the spectrum $|\hat{S}_{w,i}|$ has to be corrected by a
gain factor $g_i$ which is calculated by a MSE-gain Calculator 78 according to:

$$ g_i = \frac{ \sum_{j=0}^{256} |S_w[j]| \cdot |\hat{S}_{w,i}[j]| }{ \sum_{j=0}^{256} \left( |\hat{S}_{w,i}[j]| \right)^2 } \tag{17} $$

A multiplier 82 is arranged for scaling the spectrum $|\hat{S}_{w,i}|$ with the gain factor
$g_i$. A subtracter 84 computes the difference between the coefficients of the target spectrum as
determined by the Amplitude Spectrum Computer 36 and the output signal of the multiplier 82.
Subsequently a summing squarer computes a squared error signal $E_i$ according to:

$$ E_i = E(f_{0,i}) = \sum_{j=0}^{256} \left( |S_w[j]| - g_i \cdot |\hat{S}_{w,i}[j]| \right)^2 \tag{18} $$
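The analysis-by-synthesis selection performed by the gain calculator, multiplier, subtracter and summing squarer can be sketched as below. The sketch assumes that the candidate spectrum shapes have already been computed, and that the gain is the least-squares gain that minimizes the squared error for a given shape; the function names and the tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple


def mse_gain(target: Sequence[float], shape: Sequence[float]) -> float:
    """Gain g minimizing sum((target - g * shape)^2)."""
    num = sum(t * c for t, c in zip(target, shape))
    den = sum(c * c for c in shape)
    return num / den if den > 0.0 else 0.0


def squared_error(target: Sequence[float], shape: Sequence[float], g: float) -> float:
    return sum((t - g * c) ** 2 for t, c in zip(target, shape))


def refine_pitch(target: Sequence[float],
                 candidates: Sequence[Tuple[float, Sequence[float]]]
                 ) -> Tuple[float, float, float]:
    """Return (f0, gain, error) of the candidate with the smallest error.

    target     -- amplitude spectrum |S_w| of the original speech
    candidates -- pairs (f0, candidate spectrum shape) to be tried
    """
    best = None
    for f0, shape in candidates:
        g = mse_gain(target, shape)
        e = squared_error(target, shape, g)
        if best is None or e < best[2]:
            best = (f0, g, e)
    if best is None:
        raise ValueError("no pitch candidates supplied")
    return best
```

As a toy example, with the target `[0.0, 2.0, 0.0, 0.0]` and two one-line shapes, the candidate whose line coincides with the target line is selected with gain 2 and zero error.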

The candidate fundamental frequency $f_{0,i}$ that results in the minimum value of $E_i$ is
selected as the refined fundamental frequency or refined pitch. In the encoder according to the
present example, a total of 368 pitch periods are possible, requiring 9 bits for encoding. The pitch
is updated every 10 msec, independent of the mode of the speech encoder. In the gain calculator
40 according to Fig. 3, the gain to be transmitted to the decoder is calculated in the same way as
is described above with respect to the gain $g_i$, but now the quantized a-parameters are used
instead of the unquantized a-parameters which are used when calculating the gain $g_i$. The gain
factor to be transmitted to the decoder is non-linearly quantized in 6 bits, such that for small
values of $g_i$ small quantization steps are used, and for larger values of $g_i$ larger quantization steps
are used.

In the unvoiced speech encoder 14 according to Fig. 6, the operation of the
LPC parameter computer 82 is similar to the operation of the LPC parameter computer 30
according to Fig. 4. The LPC parameter computer 82 operates on the high pass filtered speech
signal instead of on the original speech signal as is done by the LPC parameter computer 30.
Further, the prediction order of the LPC computer 82 is 6 instead of 16 as is used in the LPC
parameter computer 30.

The time domain window processor 84 calculates a Hanning windowed speech
signal according to:

$$ s_w[n] = s[n] \cdot \left( 0.5 - 0.5\cos\left\{ \frac{2\pi\,((n+0.5)-120)}{160} \right\} \right); \quad 120 \le n < 280 \tag{19} $$

In an RMS value computer 86 an average value $g_{uv}$ of the amplitude of a speech frame is
calculated according to:

$$ g_{uv} = \sqrt{ \frac{1}{N} \sum_{i=0}^{N-1} s_w^2[i] } \tag{20} $$



The gain factor $g_{uv}$ to be transmitted to the decoder is non-linearly quantized
in 5 bits, such that for small values of $g_{uv}$ small quantization steps are used, and for larger values
of $g_{uv}$ larger quantization steps are used. No excitation parameters are determined by the
unvoiced speech encoder 14.

In the speech decoder 14 according to Fig. 7, the Huffman encoded LPC codes
and a voiced/unvoiced flag are applied to a Huffman decoder 90. The Huffman decoder 90 is
arranged for decoding the Huffman encoded LPC codes according to the Huffman table used by
the Huffman encoder 18 if the voiced/unvoiced flag indicates an unvoiced signal. The Huffman
decoder 90 is arranged for decoding the Huffman encoded LPC codes according to the Huffman
table used by the Huffman encoder 24 if the voiced/unvoiced flag indicates a voiced signal. In
dependence on the value of the Huffman bit, the received LPC codes are decoded by the
Huffman decoder 90 or passed directly to a demultiplexer 92. The gain value and the received
refined pitch value are also passed to the demultiplexer 92.

If the voiced/unvoiced flag indicates a voiced speech frame, the refined pitch,
the gain and the 16 LPC codes are passed to a harmonic speech synthesizer 94. If the
voiced/unvoiced flag indicates an unvoiced speech frame, the gain and the 6 LPC codes are
passed to an unvoiced speech synthesizer 96. The synthesized voiced speech signal $\hat{s}_{v,k}[n]$ at the
output of the harmonic speech synthesizer 94 and the synthesized unvoiced speech signal
$\hat{s}_{uv,k}[n]$ at the output of the unvoiced speech synthesizer 96 are applied to corresponding inputs
of a multiplexer 98.

In the voiced mode, the multiplexer 98 passes the output signal $\hat{s}_{v,k}[n]$ of the
Harmonic Speech Synthesizer 94 to the input of the Overlap and Add Synthesis block 100. In the
unvoiced mode, the multiplexer 98 passes the output signal $\hat{s}_{uv,k}[n]$ of the Unvoiced Speech
Synthesizer 96 to the input of the Overlap and Add Synthesis block 100. In the Overlap and Add
Synthesis block 100, partly overlapping voiced and unvoiced speech segments are added. For the
output signal s[n] of the Overlap and Add Synthesis Block 100 can be written:

$$ s[n] = \begin{cases}
\hat{s}_{uv,k-1}[n + N_s/2] + \hat{s}_{uv,k}[n] & ;\ v_{k-1} = 0,\ v_k = 0 \\
\hat{s}_{uv,k-1}[n + N_s/2] + \hat{s}_{v,k}[n] & ;\ v_{k-1} = 0,\ v_k = 1 \\
\hat{s}_{v,k-1}[n + N_s/2] + \hat{s}_{uv,k}[n] & ;\ v_{k-1} = 1,\ v_k = 0 \\
\hat{s}_{v,k-1}[n + N_s/2] + \hat{s}_{v,k}[n] & ;\ v_{k-1} = 1,\ v_k = 1
\end{cases} \tag{21} $$

In (21) $N_s$ is the length of the speech frame, $v_{k-1}$ is the voiced/unvoiced flag for the previous
speech frame, and $v_k$ is the voiced/unvoiced flag for the current speech frame.
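The four cases of the overlap-and-add rule reduce to one addition once the multiplexer has selected the voiced or unvoiced segment for each frame. A minimal sketch follows; it assumes that each call produces the N_s/2 output samples in which the second half of the previous segment overlaps the first half of the current one, and the function name and argument order are hypothetical.

```python
from typing import List, Sequence


def overlap_add(prev_v: Sequence[float], prev_uv: Sequence[float],
                cur_v: Sequence[float], cur_uv: Sequence[float],
                v_prev: int, v_cur: int) -> List[float]:
    """Output samples s[n], 0 <= n < N_s/2, for the current frame.

    prev_* / cur_* -- windowed synthesized segments of length N_s
    v_prev, v_cur  -- voiced/unvoiced flags (1 = voiced) of both frames
    """
    prev = prev_v if v_prev else prev_uv   # multiplexer choice, frame k-1
    cur = cur_v if v_cur else cur_uv       # multiplexer choice, frame k
    half = len(cur) // 2
    # Second half of the previous segment plus first half of the current one.
    return [prev[n + half] + cur[n] for n in range(half)]
```

The same function therefore also covers voiced-to-unvoiced and unvoiced-to-voiced transitions, in which segments from different synthesizers are added.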

The output signal s[n] of the Overlap and Add Synthesis Block 100 is applied to a postfilter 102.
The postfilter is arranged for enhancing the perceived speech quality by suppressing noise
outside the formant regions.

In the voiced speech decoder 94 according to Fig. 8, the encoded pitch received
from the demultiplexer 92 is decoded and converted into a pitch period by a pitch decoder 104.
The pitch period determined by the pitch decoder 104 is applied to an input of a phase
synthesizer 106, to an input of a Harmonic Oscillator Bank 108 and to a first input of an LPC
Spectrum Envelope Sampler 110.

The LPC coefficients received from the demultiplexer 92 are decoded by the
LPC decoder 112. The way of decoding the LPC coefficients depends on whether the current
speech frame contains voiced or unvoiced speech. Therefore the voiced/unvoiced flag is applied
to a second input of the LPC decoder 112. The LPC decoder passes the quantized a-parameters to
a second input of the LPC Spectrum Envelope Sampler 110. The operation of the LPC Spectrum
Envelope Sampler 110 is described by (13), (14) and (15), because the same operation is
performed in the Refined Pitch Computer 32.

The phase synthesizer 106 is arranged to calculate the phase $\varphi_k[i]$ of the i-th
sinusoidal signal of the L signals representing the speech signal. The phase $\varphi_k[i]$ is chosen such
that the i-th sinusoidal signal remains continuous from one frame to the next frame. The voiced
speech signal is synthesized by combining overlapping frames, each comprising 160 windowed
samples. There is a 50% overlap between two adjacent frames, as can be seen from graph 118 and
graph 122 in Fig. 9. In graphs 118 and 122 the window used is shown in dashed lines. The phase
synthesizer is now arranged to provide a continuous phase at the position where the overlap has
its largest impact. With the window function used here this position is at sample 119. For the
phase $\varphi_k[i]$ of the current frame can now be written:

$$ \varphi_k[i] = \varphi_{k-1}[i] + i \cdot 2\pi \cdot f_{0,k-1} \cdot \frac{3 N_s}{4} - i \cdot 2\pi \cdot f_{0,k} \cdot \frac{N_s}{4}; \quad 1 \le i \le 100 \tag{22} $$

In the currently described speech encoder the value of $N_s$ is equal to 160. For the very first
voiced speech frame, the value of $\varphi_k[i]$ is initialized to a predetermined value. The phases
$\varphi_k[i]$ are always updated, even if an unvoiced speech frame is received. In said case,
$f_{0,k}$ is set to 50 Hz.
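The phase update can be sketched as follows, under the assumption that continuity is imposed three quarters of the way through the previous frame, which coincides with one quarter of the way into the current frame when frames overlap by 50%. Fundamental frequencies are taken in Hz and normalized by an assumed sampling frequency of 8 kHz; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def update_phases(phi_prev: Sequence[float], f0_prev_hz: float, f0_cur_hz: float,
                  n_s: int = 160, fs_hz: float = 8000.0) -> List[float]:
    """Start phases of the current frame for harmonics i = 1..len(phi_prev)."""
    phases = []
    for i, phi in enumerate(phi_prev, start=1):
        # Phase advance of harmonic i over 3*N_s/4 samples of the previous frame.
        advance_prev = i * 2.0 * math.pi * (f0_prev_hz / fs_hz) * (3 * n_s / 4)
        # Phase advance over the first N_s/4 samples of the current frame.
        advance_cur = i * 2.0 * math.pi * (f0_cur_hz / fs_hz) * (n_s / 4)
        # Wrap to keep the stored phase bounded; cos() is unaffected.
        phases.append(math.fmod(phi + advance_prev - advance_cur, 2.0 * math.pi))
    return phases
```

With this choice each harmonic of the previous frame, evaluated at its sample 120, has the same phase as the corresponding harmonic of the current frame at its sample 40, even when the pitch changes between the two frames.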

The harmonic oscillator bank 108 generates the plurality of harmonically
related signals $\hat{s}_{v,k}[n]$ that represents the speech signal. This calculation is performed using the
harmonic amplitudes m[i], the frequency $f_0$ and the synthesized phases $\varphi[i]$ according to:

$$ \hat{s}_{v,k}[n] = \sum_{i=1}^{L} m[i] \cos\{ (i \cdot 2\pi \cdot f_0) \cdot n + \varphi[i] \}; \quad 0 \le n < N_s \tag{23} $$

The signal $\hat{s}_{v,k}[n]$ is windowed using a Hanning window in the Time Domain Windowing
block 114. This windowed signal is shown in graph 120 of Fig. 9. The signal $\hat{s}_{v,k+1}[n]$ is
windowed using a Hanning window shifted $N_s/2$ samples in time. This windowed signal
is shown in graph 124 of Fig. 9. The output signal of the Time Domain Windowing Block 114
is obtained by adding the above mentioned windowed signals. This output signal is shown in
graph 126 of Fig. 9. A gain decoder 118 derives a gain value $g_v$ from its input signal, and the
output signal of the Time Domain Windowing Block 114 is scaled by said gain factor $g_v$ by the
Signal Scaling Block 116 in order to obtain the reconstructed voiced speech signal $\hat{s}_{v,k}$.
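The synthesis of one windowed and scaled voiced frame can be sketched as below. The sketch assumes a sampling frequency of 8 kHz, a fundamental frequency in Hz, and a Hanning window with a half-sample offset so that two windows shifted by N_s/2 samples add up to one; these details and the function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def hanning(n: int, n_s: int = 160) -> float:
    """Hanning window of length n_s (half-sample offset form, assumed)."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / n_s)


def synthesize_voiced(amps: Sequence[float], phases: Sequence[float], f0_hz: float,
                      gain: float = 1.0, n_s: int = 160,
                      fs_hz: float = 8000.0) -> List[float]:
    """One windowed, scaled frame of harmonically related sinusoids."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz   # fundamental in radians per sample
    frame = []
    for n in range(n_s):
        # Harmonic oscillator bank: sum over harmonics i = 1..L.
        x = sum(m * math.cos(i * w0 * n + ph)
                for i, (m, ph) in enumerate(zip(amps, phases), start=1))
        frame.append(gain * hanning(n, n_s) * x)
    return frame
```

Successive frames produced in this way are intended to be combined with a 50% overlap, so that a stationary signal is reconstructed with constant overall amplitude.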

In the unvoiced speech synthesizer 96, the LPC codes and the voiced/unvoiced
flag are applied to an LPC Decoder 130. The LPC decoder 130 provides a plurality of 6
a-parameters to an LPC Synthesis filter 134. An output of a Gaussian White-Noise Generator 132
is connected to an input of the LPC synthesis filter 134. The output signal of the LPC synthesis
filter 134 is windowed by a Hanning window in the Time Domain Windowing Block 140.
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An Unvoiced Gain Decoder 136 derives a gain value g uv representing the 
desired energy of the present unvoiced frame. From this gain and the energy of the windowed 
signal, a scaling factor g' uv for the windowed speech signal gain is determined in order to obtain 
a speech signal with the correct energy. For this scaling factor can be written: 



g'uv = 



i 



guv (24) 



N s -1 



Z (§^ Vf k[n]-w[n]y 
n=0 



The Signal Scaling Block 142 determines the output signal $s_{uv,k}$ by multiplying the output
signal of the Time Domain Windowing Block 140 by the scaling factor $g'_{uv}$.
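
The unvoiced synthesis chain (Gaussian white noise, LPC synthesis filter, Hanning window, energy normalization according to (24)) might look like the sketch below. Several details are assumptions made for illustration only: the sign convention A(z) = 1 + a[0] z^-1 + ..., the interpretation of the decoded gain as the square root of the target frame energy, the periodic Hann window, and all function names.

```python
import math
import random

def hann(n_s):
    """Periodic Hann window of length n_s."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_s) for n in range(n_s)]

def lpc_synthesis(excitation, a):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_j a[j] z^-(j+1) (assumed convention)."""
    out = []
    for n, e in enumerate(excitation):
        # Feedback over the previously computed output samples only.
        feedback = sum(a[j] * out[n - 1 - j] for j in range(min(len(a), n)))
        out.append(e - feedback)
    return out

def unvoiced_frame(a, g_uv, n_s, rng):
    """Synthesize one unvoiced frame whose root energy equals g_uv."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_s)]
    filtered = lpc_synthesis(noise, a)
    windowed = [x * w for x, w in zip(filtered, hann(n_s))]
    energy = sum(x * x for x in windowed)
    g_prime = g_uv / math.sqrt(energy)  # scaling factor in the form of (24)
    return [g_prime * x for x in windowed]
```

Normalizing by the measured energy of the windowed signal removes the frame-to-frame energy fluctuation of the random excitation, so the output level is determined by the transmitted gain alone.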

The presently described speech encoding system can be modified to require a
lower bit rate or to provide a higher speech quality. An example of a speech encoding system requiring a
lower bit rate is a 2 kbit/s encoding system. Such a system can be obtained by reducing the
number of prediction coefficients used for voiced speech from 16 to 12, and by using differential
encoding of the prediction coefficients, the gain and the refined pitch. Differential coding means
that the data to be encoded is not encoded individually, but that only the difference between
corresponding data from subsequent frames is transmitted. At a transition from voiced to
unvoiced speech or vice versa, all coefficients in the first new frame are encoded individually in
order to provide a starting value for the decoding.
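
The differential coding scheme with a reset at voiced/unvoiced transitions can be illustrated with the following sketch. It assumes that each frame is already represented by a list of integer quantizer indices, so that the differences are exact; the data layout and the function names are hypothetical and chosen only for this example.

```python
def diff_encode(frames, voiced_flags):
    """Encode each frame as a difference to the previous frame.

    The first frame, and the first frame after a voiced/unvoiced
    transition, are encoded individually to give the decoder a start value.
    Returns a list of (is_absolute, values) pairs.
    """
    encoded = []
    prev, prev_voiced = None, None
    for params, voiced in zip(frames, voiced_flags):
        if prev is None or voiced != prev_voiced or len(params) != len(prev):
            encoded.append((True, list(params)))
        else:
            encoded.append((False, [p - q for p, q in zip(params, prev)]))
        prev, prev_voiced = list(params), voiced
    return encoded

def diff_decode(encoded):
    """Invert diff_encode by accumulating the differences frame by frame."""
    frames = []
    prev = None
    for is_absolute, values in encoded:
        if is_absolute:
            current = list(values)
        else:
            current = [d + q for d, q in zip(values, prev)]
        frames.append(current)
        prev = current
    return frames
```

Since parameters of successive frames within one voicing mode tend to change slowly, the differences usually span a smaller range than the values themselves and can be represented with fewer bits.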

It is also possible to obtain a speech coder with an increased speech quality at a
bit rate of 6 kbit/s. The modification here is the determination of the phase of the first 8
harmonics of the plurality of harmonically related sinusoidal signals. The phase $\varphi[i]$ is
calculated according to:

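$$\varphi[i] = \arctan\frac{I(\theta_i)}{R(\theta_i)} \qquad (25)$$

Herein $\theta_i = 2\pi f_0 \cdot i$. $R(\theta_i)$ and $I(\theta_i)$ are equal to:

$$R(\theta_i) = \sum_{n=0}^{N-1} s_w[n] \cdot \cos(\theta_i \cdot n) \qquad (26)$$

and

$$I(\theta_i) = -\sum_{n=0}^{N-1} s_w[n] \cdot \sin(\theta_i \cdot n) \qquad (27)$$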



The 8 phases $\varphi[i]$ so obtained are uniformly quantized to 6 bits and included in the output
bit stream.
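
As an illustration of the phase determination and its uniform 6-bit quantization, the sketch below evaluates the sums R and I of (26) and (27) at the i-th harmonic and quantizes the resulting angle. The four-quadrant arctangent, the normalized f0 in cycles per sample, the quantizer range of one full turn, and the function names are assumptions of this example rather than details given in the text.

```python
import math

def harmonic_phase(s_w, f0, i):
    """Phase of harmonic i of the windowed signal s_w, from R and I."""
    theta = 2.0 * math.pi * f0 * i
    r = sum(x * math.cos(theta * n) for n, x in enumerate(s_w))
    im = -sum(x * math.sin(theta * n) for n, x in enumerate(s_w))
    # atan2 resolves the quadrant that a plain arctan of I/R would lose.
    return math.atan2(im, r)

def quantize_phase(phi, bits=6):
    """Uniform quantizer: map an angle to one of 2**bits indices."""
    levels = 1 << bits
    step = 2.0 * math.pi / levels
    return int(round((phi % (2.0 * math.pi)) / step)) % levels

def dequantize_phase(index, bits=6):
    """Map an index back to an angle in the range (-pi, pi]."""
    step = 2.0 * math.pi / (1 << bits)
    phi = index * step
    return phi - 2.0 * math.pi if phi > math.pi else phi
```

With 6 bits the quantization step is 2π/64, so the reconstructed phase differs from the measured one by at most π/64 (about 2.8 degrees).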



A further modification in the 6 kbit/s encoder is the transmission of
additional gain values in the unvoiced mode. Normally a gain is transmitted every 2 msec instead
of once per frame. In the first frame directly after a transition, 10 gain values are transmitted, 5 of
them representing the current unvoiced frame, and 5 of them representing the previous voiced
frame that is processed by the unvoiced speech encoder. The gains are determined from 4 msec
overlapping windows.



It is observed that the number of LPC coefficients is 12 and that, where possible,
differential encoding is utilized.
Claims



1. Transmitter with a speech encoder, said speech encoder comprises analysis
means for determining a plurality of linear prediction coefficients from a speech signal, said
analysis means comprises pitch determining means for determining a fundamental frequency of
said speech signal, the analysis means further being arranged for determining an amplitude and a
frequency of a plurality of harmonically related sinusoidal signals representing said speech signal
from said plurality of linear prediction coefficients and said fundamental frequency,
characterized in that the analysis means comprise pitch tuning means for tuning the fundamental
frequency of said plurality of harmonically related signals in order to minimize a measure
between a representation of said speech signal and a representation of said plurality of
harmonically related sinusoidal signals, the transmitter comprising transmit means for
transmitting a representation of said amplitudes and said fundamental frequency.

2. Transmission system according to claim 1, characterized in that the
determination of the amplitude and the frequency of a plurality of harmonically related speech
signals is based on substantially unquantized prediction coefficients, in that the representation of
said amplitudes comprises quantized prediction coefficients, and a gain factor which is
determined on basis of the quantized prediction coefficients and said fundamental frequency.

3. Transmitter according to claim 1 or 2, characterized in that the analysis means
comprise initial pitch determining means for providing at least an initial pitch value for the pitch
tuning means.

4. Transmitter according to one of the previous claims, characterized in that the
speech encoder comprises spectrum analysis means for determining a frequency spectrum of the
speech signal, and in that the pitch tuning means are arranged to minimize a difference between a
spectrum derived from said amplitudes and fundamental frequency and the spectrum of the
frequency spectrum of the speech signal.

5. Speech encoder comprising analysis means for determining a plurality of linear
prediction coefficients from a speech signal, said analysis means comprises pitch determining
means for determining a fundamental frequency of said speech signal, the analysis means further
being arranged for determining an amplitude and a frequency of a plurality of harmonically
related sinusoidal signals representing said speech signal from said plurality of linear prediction
coefficients and said fundamental frequency, characterized in that the analysis means comprise
pitch tuning means for tuning the fundamental frequency of said plurality of harmonically related
signals in order to minimize a difference measure between a representation of said speech signal
and a representation of said plurality of harmonically related sinusoidal signals, the transmitter
comprising transmit means for transmitting a representation of said amplitudes and said
fundamental frequency.

6. Speech encoder according to claim 5, characterized in that the analysis means
comprise initial pitch determining means for providing at least an initial pitch value for the pitch
tuning means.

7. Speech encoder according to claim 5 or 6, characterized in that the speech
encoder comprises spectrum analysis means for determining a frequency spectrum of the speech
signal, and in that the pitch tuning means are arranged to minimize a difference between a
spectrum derived from said amplitudes and fundamental frequency and the spectrum of the
frequency spectrum of the speech signal.

8. Speech encoding method comprising determining a plurality of linear
prediction coefficients from a speech signal, determining a fundamental frequency of said speech
signal, determining an amplitude and a frequency of a plurality of harmonically related
sinusoidal signals representing said speech signal from said plurality of linear prediction
coefficients and said fundamental frequency, characterized in that the method comprises tuning
the fundamental frequency of said plurality of harmonically related signals in order to minimize
a difference measure between a representation of said speech signal and a representation of said
plurality of harmonically related sinusoidal signals.

9. Method according to claim 8, characterized in that the method comprises
providing at least an initial pitch value for the pitch tuning means.



10. Method according to claim 8 or 9, characterized in that the method comprises
determining a frequency spectrum of the speech signal, and in that the method comprises
minimizing a difference between a spectrum derived from said amplitudes and fundamental
frequency and the spectrum of the frequency spectrum of the speech signal.



11. Tangible medium comprising a computer program for executing a speech
encoding method comprising determining a plurality of linear prediction coefficients from a
speech signal, determining a fundamental frequency of said speech signal, determining an
amplitude and a frequency of a plurality of harmonically related sinusoidal signals representing
said speech signal from said plurality of linear prediction coefficients and said fundamental
frequency, characterized in that the method comprises tuning the fundamental frequency of said
plurality of harmonically related signals in order to minimize a difference measure between a
representation of said speech signal and a representation of said plurality of harmonically related
sinusoidal signals.